fix(webhook): merge dataset annotations in multi-round FUSE injection#5974
Conversation
Signed-off-by: Sukuna0007Abhi <appsonly310@gmail.com>
|
Hi @Sukuna0007Abhi. Thanks for your PR. I'm waiting for a fluid-cloudnative member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Code Review
This pull request updates the DatasetUsageInjector to support multi-round injection by merging existing dataset usage annotations with new ones instead of overwriting them. The review feedback points out a potential issue where malformed or empty values in existing annotations could lead to empty strings in the final annotation list, and suggests filtering out empty strings after trimming.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
pls review @cheyang @TrafalgarZZZ @xliuqq |
cheyang
left a comment
There was a problem hiding this comment.
Thanks for picking this up, @Sukuna0007Abhi — nice first contribution!
The approach looks solid: reading the existing annotation, deduplicating via a map, and writing back the sorted union. A few observations:
What looks good:
- Nil-map access on
pod.Annotationsis safe in Go (returns zero value), so the read path is correct even when annotations haven't been initialized yet. - The
len(existingVal) > 0guard prevents splitting an empty string into[""]. - Deduplication + sorted output ensures idempotent behavior across rounds — nice.
- The TODO comment removal is appropriate now that the issue is addressed.
Minor nit (non-blocking):
When parsing the existing annotation value, strings.TrimSpace(ds) could produce an empty string if the stored value is somehow malformed (e.g., trailing comma: "a,b,"). Adding a quick filter would make it fully defensive:
for _, ds := range existingDatasets {
trimmed := strings.TrimSpace(ds)
if trimmed != "" {
datasetsInUseMap[trimmed] = struct{}{}
}
}In practice this is unlikely since this code is the only writer of the annotation, but it's a cheap guard. Up to you whether to address it — won't block on this.
Tests: The updated test correctly validates the merge scenario. If you're feeling ambitious, a test for duplicate detection (same dataset already in annotation + in current runtimeInfos) would round things out, but what's here is adequate.
Overall this is a clean, well-scoped fix. LGTM from a code-correctness standpoint.
/ok-to-test
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #5974 +/- ##
==========================================
+ Coverage 63.54% 63.57% +0.03%
==========================================
Files 478 479 +1
Lines 33217 33290 +73
==========================================
+ Hits 21107 21165 +58
- Misses 10432 10445 +13
- Partials 1678 1680 +2 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Thanks for the review, thanks for approving and merging 🙌 |
|
Pls approve @cheyang @TrafalgarZZZ |
There was a problem hiding this comment.
Pull request overview
Fixes the DatasetUsageInjector mutating webhook so multi-round FUSE sidecar injection merges into the existing fluid.io/datasets-in-use annotation instead of overwriting it, preserving dataset usage tracking across sequential mutations.
Changes:
- Merge existing
datasets-in-useannotation value with newly injected dataset names via a deduplicating map and sorted output. - Remove the prior TODO by implementing the previously missing multi-round behavior.
- Update the Ginkgo test to assert merge behavior rather than overwrite behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/webhook/plugins/datasetusageinjector/dataset_usage_injector.go | Implements merge/dedupe/sort when writing the datasets-in-use annotation. |
| pkg/webhook/plugins/datasetusageinjector/dataset_usage_injector_test.go | Updates unit test expectations to validate merged annotation output. |
Comments suppressed due to low confidence (1)
pkg/webhook/plugins/datasetusageinjector/dataset_usage_injector_test.go:176
- The updated test only covers merging when the existing annotation contains a single dataset. Since the new logic also parses comma-separated values, trims whitespace, and deduplicates via a map, add coverage for (1) existing annotation with multiple entries / whitespace and (2) a duplicate dataset across rounds to ensure the merge logic behaves as intended.
It("should merge the existing annotation with the new one", func() {
pod := &corev1.Pod{
ObjectMeta: metav1.ObjectMeta{
Name: "test-update",
Namespace: "fluid-test",
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Summary: Clean, well-scoped fix that correctly addresses the multi-round FUSE injection annotation overwrite bug. The approach is sound: read existing annotation values into a map for deduplication, merge new datasets, sort, and write back. Key observations:
Non-blocking suggestion: Filter out empty strings after Test coverage note: The updated test validates the merge scenario well. A test for partial-overlap deduplication (annotation has datasets A and B, runtimeInfos has B and C, expect A,B,C) would round things out but is not required. No blocking issues found. The fix is correct and ready for merge. Review by fluid-cloudnative/fluid agent reviewer |
…partial overlap test Signed-off-by: Sukuna0007Abhi <appsonly310@gmail.com>
|
|
@Sukuna0007Abhi Thanks for fixing this! There's a lint failure in current branch. Could you help also fix that? And we can merge it. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cheyang, TrafalgarZZZ The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Thank you @cheyang @TrafalgarZZZ
Sorry I was bit offline in that time, but it's merged now thanks, I think the lint got fixed or if not, I will raise another pr, Thanks again @cheyang @TrafalgarZZZ |



Ⅰ. Describe what this PR does
This PR fixes an edge case in the DatasetUsageInjector webhook where it was completely overwriting the fluid.io/dataset-in-use annotation instead of merging it.
I noticed there was a // TODO comment left in the codebase mentioning that multi-round sidecar injection wasn't fully supported yet. So, if a Pod got mutated in multiple rounds (for example, injecting different datasets sequentially), the webhook would wipe out the datasets from earlier rounds and only keep the last one.
To fix this, I updated the logic to check if the annotation already exists. If it does, we parse the existing comma-separated string, merge it with the new datasets using a Go map to avoid duplicates, sort them, and safely update the annotation.
Ⅱ. Does this pull request fix one issue?
fixes #5973
Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.
I updated the Ginkgo unit tests in pkg/webhook/plugins/datasetusageinjector/dataset_usage_injector_test.go. Specifically, I modified the Context("when pod has different annotation value") block to explicitly verify that the old dataset and the new dataset are correctly merged into a single, sorted comma-separated string, rather than being overwritten.
Ⅳ. Describe how to verify it
You can verify this by running the standard webhook test suite: go test -v ./pkg/webhook/plugins/datasetusageinjector/...
To manually verify, you can create a Pod that requests multiple datasets which trigger sequential mutations, and then check the Pod's annotations using kubectl get pod -o yaml. The fluid.io/dataset-in-use annotation will now contain all the injected datasets.
Ⅴ. Special notes for reviews
The string manipulation and deduplication logic was kept as lightweight as possible to avoid adding any overhead to the admission controller's critical path.